binary: fix string parsing on big-endian hosts #6605

Open
amaanq wants to merge 2 commits into KhronosGroup:main from
amaanq:fix-big-endian-strings

amaanq commented Mar 17, 2026

Note

The fixes are split into individual commits to make reviewing easier.

Problem

Two places in the codebase read SPIR-V string data incorrectly on big-endian hosts:

  1. binary.cpp: When parsing a SPIR-V binary with a different endianness than the host (e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser reads string operands from raw _.words without byte-swapping first. MakeString then extracts bytes assuming native word layout, producing garbled strings; for example, "OpenCL.std" reads as "nepOs.LC".

  2. extract_source.cpp: The objdump source extraction used reinterpret_cast<const char*> on parsed instruction words, which gives the wrong byte order on big-endian hosts, since SPIR-V strings pack characters starting from the lowest byte of each word.
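
To make the reported garbling concrete, here is a minimal standalone sketch of the string layout described above. None of it is SPIRV-Tools code: `PackString`, `UnpackString`, `SwapBytes`, and `DecodeWithoutFix` are hypothetical helpers written only for illustration. It assumes the first character of a string sits in the lowest-order byte of each 32-bit word, and it simulates the mismatch by reversing the bytes of every word before decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs a string into 32-bit words with the first
// character in the lowest-order byte of each word, followed by a null
// terminator and zero padding.
std::vector<uint32_t> PackString(const std::string& s) {
  std::vector<uint32_t> words((s.size() + 4) / 4, 0u);  // +1 byte for '\0'
  for (std::size_t i = 0; i < s.size(); ++i) {
    const uint32_t byte = static_cast<unsigned char>(s[i]);
    words[i / 4] |= byte << (8 * (i % 4));
  }
  return words;
}

// Hypothetical helper: reads characters from the low bits of each word
// upward and stops at the first null byte. It uses shifts only, so the
// result does not depend on the host's memory byte order.
std::string UnpackString(const std::vector<uint32_t>& words) {
  std::string out;
  for (uint32_t w : words) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((w >> shift) & 0xFFu);
      if (c == '\0') return out;
      out.push_back(c);
    }
  }
  return out;
}

// Reverses the four bytes of a word.
uint32_t SwapBytes(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
         ((w << 8) & 0x00FF0000u) | (w << 24);
}

// Simulates decoding words whose bytes are reversed relative to what the
// decoder expects, with no correction applied first.
std::string DecodeWithoutFix(const std::string& s) {
  std::vector<uint32_t> words = PackString(s);
  for (uint32_t& w : words) w = SwapBytes(w);
  return UnpackString(words);
}

void CheckGarbling() {
  assert(UnpackString(PackString("OpenCL.std")) == "OpenCL.std");
  // "Open" -> "nepO", "CL.s" -> "s.LC", and the swapped third word now
  // starts with a null byte, so decoding stops there.
  assert(DecodeWithoutFix("OpenCL.std") == "nepOs.LC");
}
```

The truncation after eight characters falls out of the layout: the last word holds "td" followed by two null bytes, so once its bytes are reversed the decoder meets a null terminator immediately.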

Solution

  1. Byte-swap the words before passing them to MakeString when requires_endian_conversion is true, matching how other operand types are already handled via spvFixWord.

  2. Use MakeString instead of raw casts, which correctly extracts characters from the low bits of each word regardless of host endianness.
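
The two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `FixWordSketch` and `MakeStringSketch` are hypothetical stand-ins for `spvFixWord` and `MakeString` (whose real signatures are not shown in this PR description), and `ReadStringOperand` and `ReadByRawCast` are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for spvFixWord: reverses the bytes of one word.
uint32_t FixWordSketch(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
         ((w << 8) & 0x00FF0000u) | (w << 24);
}

// Hypothetical stand-in for MakeString: extracts characters from the low
// bits of each word upward, stopping at the first null byte.
std::string MakeStringSketch(const std::vector<uint32_t>& words) {
  std::string out;
  for (uint32_t w : words) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((w >> shift) & 0xFFu);
      if (c == '\0') return out;
      out.push_back(c);
    }
  }
  return out;
}

// Sketch of solution 1: fix each word before decoding when the binary's
// endianness differs from the host's.
std::string ReadStringOperand(const std::vector<uint32_t>& raw_words,
                              bool requires_endian_conversion) {
  std::vector<uint32_t> fixed(raw_words);
  if (requires_endian_conversion) {
    for (uint32_t& w : fixed) w = FixWordSketch(w);
  }
  return MakeStringSketch(fixed);
}

// The pattern solution 2 replaces: viewing the words as bytes in memory
// order. This matches MakeStringSketch only on little-endian hosts, so it
// is shown for contrast and deliberately not asserted on below.
std::string ReadByRawCast(const std::vector<uint32_t>& words) {
  std::vector<char> bytes(words.size() * sizeof(uint32_t) + 1, '\0');
  if (!words.empty()) {
    std::memcpy(bytes.data(), words.data(), words.size() * sizeof(uint32_t));
  }
  return std::string(bytes.data());
}

void CheckFix() {
  // "OpenCL.std" with 'O' in the lowest byte of the first word.
  const std::vector<uint32_t> native = {0x6E65704Fu, 0x732E4C43u, 0x00006474u};
  // The same words as loaded from a binary of the opposite endianness.
  const std::vector<uint32_t> swapped = {0x4F70656Eu, 0x434C2E73u, 0x74640000u};
  assert(ReadStringOperand(native, false) == "OpenCL.std");
  assert(ReadStringOperand(swapped, true) == "OpenCL.std");
}
```

Because the decoder works on word values through shifts rather than on bytes in memory, it gives the same answer on any host once the words themselves are in native order.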

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

amaanq force-pushed the fix-big-endian-strings branch from 37f786b to 9103f43 Compare March 17, 2026 22:56
When parsing a SPIR-V binary with a different endianness than the host
(e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser
reads string operands from raw `_.words` without byte-swapping first.
`MakeString` then extracts bytes assuming native word layout, producing
garbled strings; for example, "OpenCL.std" reads as "nepOs.LC".

Byte-swap the words before passing them to `MakeString` when
`requires_endian_conversion` is true, matching how other operand types
are already handled via `spvFixWord`.
amaanq force-pushed the fix-big-endian-strings branch from 9103f43 to cafde04 Compare March 18, 2026 00:19
`extract_source.cpp` used `reinterpret_cast<const char*>` on parsed
instruction words to read string data. On big-endian hosts, bytes within
each native-endian word are in high-to-low memory order, but SPIR-V
strings pack characters starting from the lowest byte of each word.

Use `MakeString` instead of raw casts, which correctly extracts
characters from the low bits of each word regardless of host endianness.
amaanq force-pushed the fix-big-endian-strings branch from cafde04 to 133e93b Compare March 18, 2026 00:26
s-perron requested a review from dneto0 March 26, 2026 15:31
dneto0 (Collaborator) commented Apr 14, 2026

Thanks for your patience awaiting my review.

Hi, please take a look at the analysis and experiments at #5302 (comment)

There I show that the binary parser handles both big-endian and little-endian binaries.

I haven't looked at source extraction. Please provide examples showing there is a problem, and also provide tests with any suggested fixes.

3 participants